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Abstract — In this letter, we develop a novel network-wide 
traffic analysis methodology, named Stable Principal Compo- 
nent Pursuit with Time-Frequency Constraints (SPCP-TFC), 
for extracting the baseline of a traffic matrix. Following a 
refined traffic matrix decomposition model, we present new time- 
frequency constraints to extend Stable Principal Component 
Pursuit (SPCP), and design an efficient numerical algorithm 
for SPCP-TFC. At last, we evaluate the SPCP-TFC method by 
abundant simulations, and show it has superior performance than 
other traffic baseline methods such as RBL and PCA. 

Index Terms — baseline of traffic matrix, robust PCA, time- 
frequency constraint, numerical algorithm, simulation. 

I. Introduction 

THE Internet traffic data, which is usually modeled by 
the superposition of diverse components fTlf^lPI, is 
of critical importance to network management. The baseline 
traffic represents the most prominent traffic patterns [21, and 
is helpful to anomaly detection, capacity planning, and green 
computing. Most traditional studies focused on baselining the 
single-link traffic ID, however, as the network- wide traffic 
analysis is becoming increasingly popular, baselining the 
network-wide traffic turns into a most urgent problem. 

The traffic matrix is an important kind of network-wide 
traffic data and has complex characteristics. The baseline of 
a traffic matrix should capture the common patterns among 
traffic flows, and be stable against the disturbance of large 
anomalies. Principal Component Analysis (PCA) was used 
for traffic matrix analysis in and showed the low-rank 
nature of the baseline (i.e. the deterministic) component, but 
it performed poorly in the presence of contamination by 
large anomalies |5|. Recently, the Robust Principal Component 
Analysis (RPCA) theory ||6| has attracted wide attentions. Sup- 
posing a low-rank matrix is contaminated by a sparse matrix 
whose non-zeros entries may have large magnitudes, Candes 
et al. |6| presented a robust matrix decomposition method 
named Principal Component Pursuit (PCP), and proved that 
under surprising board conditions, PCP could recover the low- 
rank matrix with high probability. Following this exact "low- 
rank and sparsity" assumption, Bandara et al. (2] proposed 
the Robust BaseLine (RBL) method, and argued that RBL 
performs better than several existing traffic baseline methods. 

Intuitively, the baseline traffic time-series of each flow 
should be "smooth" enough, i.e. it records the long-term and 
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Steady traffic trends (such as the diurnal pattern) and neglects 
the short-term fluctuations. However, to the best of our knowl- 
edge, this problem has not been adequately studied when 
baselining the traffic matrix. Actually, the resulting baseline 
traffic time-series for RBL are not smooth enough, and it can 
be explained that the empirical traffic matrix does not exactly 
meet the "low-rank and sparsity" assumption, on the contrary, 
it also contains non-smooth noisy fluctuations which should be 
decomposed specially. Furthermore, the evaluation of traffic 
baseline methods was not sufficient in previous studies. A 
key point is that obtaining the ground-truth baseline of real- 
world traffic data is usually impossible, consequently, one 
could neither measure the accuracy of a baseline method, 
nor compare the peiformances of different baseline methods. 
Therefore, baselining network-wide traffic is still a challenging 
task. 

In this letter, we first establish a refinement of the traffic ma- 
trix decomposition model in |3|, which extends the assumption 
of deterministic traffic matrix to characterize the smoothness of 
baseline traffic. Secondly, we propose a novel baseline method 
for the traffic matrix data, which solves a constrained convex 
program named Stable Piincipal Component Pursuit with 
Time-Frequency Constraints (SPCP-TFC). As an extension of 
the Stable Principal Component Pursuit (SPCP) method |7|, 
SPCP-TFC takes distinctive time-frequency constraints for our 
refined model. Thirdly, we design the Accelerate Proximal 
Gradient (APG) algorithm for SPCP-TFC which has a fast 
convergence rate. At last, we evaluate our traffic baseline 
method by abundant simulations, and show it has a superior 
performance than RBL and PCA. 

II. Methodology 

A. A Refined Traffic Matrix Decomposition Model 

Suppose X e M^^^ is a traffic matrix, and each column 
Xj € (1 < j < P) is a traffic flow in T time intervals. A 
traffic flow is the traffic of a Original-Destination (OD) pair 
[IJ, or the traffic traversing a unidirectional backbone link |2J. 
In lO, we proposed the simple Traffic Matrix Decomposition 
Model (TMDM), assuming X is the sum of a low-rank matrix, 
a sparse matrix, and a noise matrix. This model is equivalent 
to the data model of the generalized RPCA problem |7|, 
and the low-rank deterministic traffic matrix corresponds to 
the baseline trafficj^ However, TMDM gives no temporal 
characteristics of the baseline traffic. Since each baseline traffic 
flow records the long-term and steady traffic trends, it should 
display a smooth curve. Plenty of mathematical tools, such as 

' In the following discussion, the words "deterministic traffic" and "baseline 
traffic" are used interchangeably. 
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wavelets and splines, can formulate smoothness. As the most 
salient baseline traffic patterns are slow oscillation behaviors, 
this letter formulates the baseline traffic time-series as the sum 
of harmonics with low frequencies, and establishes a Refined 
Traffic Matrix Decomposition Model (R-TMDM): 
Definition 1 (R-TMDM) The traffic matrix X eR^""^ is the 
superposition of the deterministic (baseline) traffic matrix A, 
the anomaly traffic matrix E, and the noise traffic matrix N. A 
is a low-rank matrix, and for each column time-series in A, the 
Fourier spectra whose frequencies exceed a critical value fc 
are zeros; E is a sparse matrix, but the non-zeros entries may 
have large magnitudes; N is a random noise matrix, and each 
column time-series is a zero-mean stationary random process. 

For simplicity, we assume each column Nj (1 < j < P) is 
the white Gaussian noise with variance cr| > in this letter. 

B. Stable Principal Component Pursuit with Time-Frequency 
Constraints 

Stable Principal Component Pursuit (SPCP) for the gener- 
alized RPCA problem solves this convex program Q: 



minimize 1 1 AIL - 

A,E,N 

s.t. A + E + N 



M\E\\i 
-X, \\N\\ 



l<5, 



(1) 



where ||.||*, \\.\\f denote the nuclear norm, the li 

norm, and the Frobenius norm, respectively; A > is the 
balance parameter, and (5 > is the constraint parameter. 
To extract the baseline traffic more powerfully, we extend 
SPCP by preserving the objective function and redesigning 
the constraint functions. 

Firstly, considering the R-TMDM model, it is necessary 
to add a constraint for the baseline traffic matrix based on 
its frequency-domain assumption. W — [Wo ■ ■ • W^t-i]txt 
denotes the discrete Fourier basis matrix of length T. For each 
< /s < T — 1, the Fourier basis Wk is defined as 

Wk{t) = ^e-^'-^^'-^l t^l,...,T, (2) 



T 

with frequency fk = inm{^, ^^^}. Wh is made up of the 
high-frequency bases Wk satisfying fk > fc, thus M^h(W^h)^ 
is the projection matrix to high-frequency subspace. We use 
the following constraint for the baseline traffic matrix: 



Wh{WhVA = Otxp. 



(3) 



Secondly, unlike the Frobenius norm inequality in ([T]i, we 
use a different constraint strategy for the noise traffic. For each 
column vector Nj (1 < j < P), consider its periodogram 
{lNj{fk)}^Zl fo"" Fourier frequencies {fk} 



T- 
k=0 ■ 



Ht-i) 



t=i 



(4) 



xi iXs) denotes the Chi-square (Chi) distribution with s 
freedom degree, and it is well-known (SI that when A'^ 
are i.i.d. standard Gaussian variable^ {'^lNj{fk)}J^Zi 



^As the preprocessing for X, for each column vector Xj (1 < j < P), 
we divide it by crj. This operation normalizes the Gaussian variables in N, 
and preserves all the hypotheses of A and E in the R-TMDM model. 



i.i.d X2 variables, and Ipf. (/o) is a Xi variable. Therefore, 
{V2\Wf[ NjWlz^ are i.i.d X2 variables, and \Wj Nj\ is a xi 
variable. We choose these time-frequency constraints for the 
noise traffic matrix: 




Wt-iVNj\ 



<<5i, 



j = l,...,P. (5) 



(6) 



Equivalently, A^ lies in the convex subset B{Si,62,S3): 

A^ G M^''-^ and satisfies 
(H with 61,62, S3 > 

The main idea of (jSj) is to eliminate the low-frequncy baseline 
traffic from the resulting noise traffic by utilizing the value 
distribution of its periodogram, while the Frobenius norm con- 
straint in ([T} could not filter the baseline traffic so efficiently. 

Lastly, using the new constraints ([3]) and (j5]l, we propose our 
traffic baseline method, named Stable Principal Component 
Pursuit with Time-Frequency Constraints (SPCP-TFC): 



\M* + M\E\\i 



minimize 

A.E,N 



S.t. A + E + N = X, 
N e 3(61,62,63), 



Wu{WuVA^Otxp, 



(7) 



which outputs A as the baseline traffic matrix. In this study, 
we set ^1 — 3.03, 62 — 63 ~ 2.56, corresponding to the 99% 
confidence intervals of the X2, Xi, ™d Gaussian variables, 
respectively. 

C. Numerical Algorithm 

SPCP-TFC solves a constrained program which is compu- 
tational expensive. The Accelerate Proximal Gradient (APG) 
algorithm considers this relaxed unconstrained program: 

minimize E, N) = fig(A, E, N) + f(A, E, N) 

f(A, E,N)^^-\\X-A-E~N\\l + ^WW^iW^^AWl 
g(A,E, N) ^ \\A\U + X\\E\\, + iBiSuS^M)^ 

(8) 

The function f(A,E,N), which is convex and differentiable, 
penalizes violations of the two equality constraints in (j7|i, 
and /3 > is a balancing parameter. g{X, E, N) is a linear 
combination of three convex but non-differentiable functions, 
/i > is a relaxation parameter, as /i \ 0, the solution to ([8]) 
approaches to the solution to (j?]). V/ denotes the gradient of 
/, which equals to 

" [I + l3Wii{WiiV]A + E^ 
\/f(A,E,N)^ A + E + N-X 

A + E + N - X 



N -X 



and is Lipschitz continuous by the following proposition. 
Proposition 1: V f(A, E, N) is Lipschitz continuous, and the 
Lipschitz constant is L = ^79 + 4/3 + 
For the proof, see the appendix section. 

Instead of directly minimizing F(A,E,N), the APG al- 
gorithm minimizes a sequence of quadratic approximations. 
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denoted as Q{A,E,N,Y'^,Y^ ,Y^), of F(A,E,N) at a 
specially chosen point {Y^^Y^ ^Y'^) (renewed in each step): 

Q{A,E,N,Y'^,Y'^,Y^) 
=MA,E,N) + fiY^,Y^,Y'') 

{Vf{Y^,Y^, r^), {A, E, N) ~ {Y^^Y^^Y"")) + 



Algorithm 1 APG for SPCP-TFC 



-\\{A,E,N)-{Y^,Y^,Y'') 

It can be derived that 

minimize Q{A,E,N,Y^,Y^,Y 



|2 

If ■ 



Input: ti-affic matrix X e M^^^. 
Normalization: X = X/diag{crj}. 

Initialization: A^ ^ A_i = E^^ = E_i ^ Nq ^ N_i ^ Q; 
to = t_i = 1; /c = 0. /io = 0.99||X||2; JI ^ lO^Vo; V = 0.9. 
L = + 4/3 + /32; A = 1/Vmax(r, P). 
While not converged do 



-- minimize 



L. 



\A~G 



A\\2 
F 



L. 



minimize < /iA||i?||i H \\E - 

E \ 2 



G 



E\\2 
F 



L. 



mmmiize <j ,53) (^) + 2 1 1 ^ " 



G 



N\\2 
F 



Constant 
(10) 



where 

G^ 



Y' 



Y 



N 



Thus (10 1 splits into three subproblems, which are equivalent 
to the proximity operators associated with the convex functions 
i^WEh, and iiB{S^M,s4N), respectively. Using 
basic results in 111], they have explicit solutions as follow: 



argmin ^|| A| 



^\\A-G^rF\ = us^[nv^, 



argmin <i/iA|l£;|li + -|l£;-G^|||, 



: 5fiA \G^ 



(11) 



(12) 



\N -G 



N\\2 
F 



(13) 



argnnn jA"-s(5i,d-2,53)(^) " 

where UYy^ is the singular value decomposition (SVD) of 
G^, 'Pb{Si,S2,S3) '^he projection operator to B{Si, 62,63), 
and Sf is the soft-thresholding operator with parameter e 13]. 
For each X e M^^-^, e M.'^''^ and it satisifies 

5, [X] (z, j) = sign(X(z, j)) max{|X(i, i)| - e, 0}. 

Algorithm [T] presents the APG algorithm for SPCP-TFC, 
moreover, the following theorem proves that Algorithm [T] has 
the 0{l/k^) convergence rate. Since the proof of this theorem 
is quite close to Theorem 4.4 in ifTOl . we omit it and directly 
summarize this result: 

Theorem 1 Let F{A,E,N) = Jig{A,E,N) + J{A,E,N). 
Then for all k > ko — log /log (^)j' '^"^^^ 

2 



F{Xk)-FiX*) < 



2^/9TWTW\\Xko-X*\\ 



(14) 



(fc-fco + l)2 

where Xk = {Ak,Ek,Nk) is defined in Algorithm^ and 
X* — {A* , E* , N*) is a solution to when ji^Jl. 



Yi^ = Eu+'-^{E,-E,^,); 



vN 



y^ 



■y^ 



X) 

X); 



{U, S, V) — svd[G^]; // Singular Value Decomposition. 

Ak+i - USi,[S]V^; Ek+i - 5m [Gf]; 

Nk+i = 'Pb{Si S2,S3)[G k ]; 

ifc+i = (1 + V4<2 + l)/2; ^fe+i = max(?7/ife,7l); 
fc = fc + 1. 
End while 

Output: A = Ak ■ diag{crj} is the baseline traffic matrix. 



III. Simulation Results 

We evaluate SPCP-TFC by simulated traffic matrices, be- 
cause the simulation approach could provide the ground-truth 
baseline data, perform abundant experiments independently, 
and change parameters to verify an algorithm's stability. 
Consequently, we first introduces the simulation techniques. 

Suppose the network has 10 nodes, the measurement win- 
dow is one week, and the minimal time interval is At = 5 
minutes. Under this setting, each traffic matrix X e M^^^ 
has P = 192 = 100 OD flows in T = 7 x 24 x 12 = 2016 
time intervals, and it is generated by the following three traffic 
components. (1) For each flow j, the baseline traffic Aj is the 
sum of a content a^.o and a few sin functions: 



M 

flj,™ sin 



2Trl„it 
T 



<y3j,m ,j = 1,...,P. 



We assume that in each simulation, {aj,o}f^^ satisfies the 
gravity model and 22j=i ^jfi = 10 . Since the period of the m- 
th sin is Y^Xj^ hours, we choose {Im} = {7, 14, 28, 56, 112}, 
and the periods of oscillation traffic are {24,12,6,3,1.5} 
hour^ Moreover, the phase parameters {(pj.m} are uniform 



distributed in 



5 ' 5J 



and the amplitudes of the sin functions 



decay quickly as m increases by aj^„i+i = 0-5aj^m- (2) The 
non-zero entries are randomly distributed in 1% positions of 
the anomaly traffic matrix E, and for each flow j, the volume 
of each anomaly is O.Sflj.o. (3) For each flow j, the noise traffic 
Nj consists of T independent Gaussian variables N{0,(jj), 
where aj = aaj^, a > controls the noise rate. 

We simulate 100 baseline traffic matrices simultaneously, 
and generate one anomaly traffic matrix and two noise traffic 
matrices for each of them (a is chosen as 0.1 and 0.2, which 

^This setting is based on the discussions on baclcbone traffic in [lj||5j, for 
simplicity, we neglect periodical traffic patterns longer than 24 hours. 
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Fig. 1. The NRMSEs between the resulting and the ground-truth basehne 
traffic matrices. The bar shows the median and the error bar shows 10% and 
90% percentile. Left group: q = 0.1; right group: a = 0.2. 
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Fig. 2. Boxplots of the Pearson Correlation Coefficients between resulting 
and ground-tratli baseline traffic flows. The results of SPCP-TFC are enlarged 
in small plots. Left panel: a = 0.1; right panel: a = 0.2. 



means a low-level noise and a high-level noise, respectively). 
Therefore, 200 simulated traffic matrices are used in exper- 
iment. It can be verified that these matrices satisfy the R- 
TMDM model, and the critical frequency fc = SPCP- 
TFC is compared with other two network-wide traffic baseline 
methods: RBL |2| and PCA |T|. We apply these methods to 
the simulation dataset, and evaluate them by three metrics. 

The first metric is the Normalized Root Mean Squared Error 
(NRMSE) between the ground-truth baseline traffic matrix A 
and its estimation A: 



NRMSE = \\A- A\\f/\\A\\f. 



(15) 



Fig. [T] illustrates the NRMSEs of different baseline methods 
under two noise levels. For SPCP-TFC, we test different values 
of parameter /3 G {0.1, 1, 10, 25, 50}. It can be observed that, 
SPCP-TFC archives significant lower NRMSEs than RBL and 
PCA, and this result is stable in a large range of f3 values. And 
our traffic baseline method shows the best performance when (3 
= 25: under the low (high) noise level, the median of NRMSEs 
for SPCP-TFC is as 30.3%(27.3%) as that for RBL, and is 
as 19.5%(15.9%) as that for PCA. Hence we only consider 
SPCP-TFC with this fixed value in the following discussions. 

The second metric characterizes the temporal correlation 
between each pair of ground-truth and resulting baseline traffic 
flows using their Pearson Correlation Coefficient. As a traffic 
matrix contains 100 flows, under each noise level, we compute 
100 X 100 = lO'' correlation coefficients for each baseline 
method. We display boxplots of these coefficients in Fig. [2] 
It is shown that the correlation coefficients for SPCP-TFC 
are very close to 1, and have an obviously greater median 




o 
E 



Traffic Matrix ID 




Traffic Matrix ID 



Fig. 3. The smoothness of resulting and ground-truth baseline traffic matrices. 
An enlarged view of SPCP-TFC's smoothness is shown in the bottom panel. 

TABLE I 

A COMPARISON BETWEEN DIFFERENT BASELINE METHODS ON THE 
NORMALIZED MEAN SMOOTHNESS VALUES 



a 


SPCP-TFC 


RBL 


PCA 


Ground-Truth 


0.1 


0.99 


3.72 


8.10 


1.00 


0.2 


0.95 


6.48 


14.78 


1.00 



than RBL and PCA. Thus the SPCP-TFC method extracts the 
temporal characteristics of the baseline traffic more precisely. 

The third metric characterizes baseline traffic on smooth- 
ness, which adds up the Total Variations of the traffic flows in 
baseline traffic matrix. A small value of this metric indicates 
the traffic is sufficient smooth. Applying this metric to the 
baseline results aforementioned, we display the curve of 100 
smoothness values (arranged by traffic matrix ID) for each 
baseline method, and for the 100 ground-truth values, in Fig. 
|3] We then compute the mean smoothness of each baseline 
method, and normalize it by dividing the mean smoothness of 
the ground-truth baseline traffic matrices. These normalized 
mean smoothness values (under two noise levels) are summa- 
rized in Tab. |l] The estimated baseline traffic matrices by RBL 
and PCA are significantly more coarse than the ground-truth 
baseline traffic, and their mean smoothness values are larger 
than three times and eight times of the ground-truth value, 
respectively. Instead, the SPCP-TFC method leads to more 
accurate baseline estimations on smoothness. This is because 
the high-frequency traffic, which has larger total variation, can 
be successfully eliminated from the baseline by SPCP-TFC. 

From Fig. [T|3] and Tab. |l] we can also observe that when 
the noise rate a grows from 0.1 to 0.2, the performance of 
our baseline method shows a decline: the NRMSEs raise, 
the correlation coefficients drop down, and the smoothness 
values of resulting baseline traffic diverge from the ground- 
truth more evidently. This is mainly because the low-frequency 
component of the noise traffic, which could not be directly 
distinguished from the baseline traffic, is proportional to a. 

IV. Conclusions 

In this letter, we propose a refined traffic matrix decompo- 
sition model, and introduce a novel baseline method for traffic 
matrix named SPCP-TFC, which extends SPCP by using new 
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constraint functions for the baseline traffic and the noise traffic, 
respectively. We design an APG algorithm for SPCP-TFC, 
whose convergence rate is 0(1/ k'^). Our baseline method is 
demonstrated by abundant simulations. Using three distinct 
metrics to evaluate the baseline results, we show SPCP-TFC 
has a superior performance than RBL and PCA. 
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Appendix A. Proof of Proposition 1 

Proof: For any two points {Ai, Ei, Ni) and {Ai,Ei,Ni) 
in R^''^ X X R'^^'P, we have 

\\Vf{Ai,E,,Ni)^VfiA2,E2,N2)\\l 
= II [/ + /3Wh(Wh)^](Ai - A2) + (El - E2) + {Ni - N2)\\l 

+ 2 \\{Ai - A2) + [El - E2) + [Ni - N2)\\l 
<(/3 ||W^h(W^h)^(^i - ^2)11^ + \\Ai-A2\\f + \\Ei -E2\\f 

+ \\Ni-N2\\f)'+ 
2(Pi - A2\\f + \\Ei- E2\\f + \\Ni - N2\\f)' 
=/32||w^h(W^h)^(Ai-A2)||^ + 
2/3( ||W^h(W^h)^(Ai - A2)fp \\Ai - A2\\f+ 

||W^h(M^h)^(^i - ^2)||^ Pi - E2\\f+ 
\\Wn{WnV{A^ - A2)fp \\N^ ~ N2\\f) + 

3(||Ai-A 



2\\f 



-E2rF + \\N, ~N, 



2\\f 



6[\\Ai - .42||i.||i;i -E2\\f + \\Ai~ A2\\f\\Ni ~ N2\\f 

+ \\Ei-E2\\f\\Ni-N2\\f) 

<(3/? + WWaiWaViAi - A2)\\l + 
(9 + -A2\\l + \\Ei-E2\\l + \\Ni - N2\\ 



(16) 



As Wh(Wh)^ is a projection operator in R'^, for each p E 
{1,...,P}, 

\\Wii{Wn)'^[A,{p) - A2{p)]\\ < \\A,ip) - A2{p)\\ . (17) 
therefore, we have 

||m^h(i^h)^(^i-A2)||^ = 

p 

= Y.\\Wh{WuV[A,{p) ~ A2{p)]\\' 



p 



(18) 



<Y,\\A,{p)~A2ip)f^\\A,-A2\ 



Add ( 18 1 to ( 16 1 and let L = ^9 + 4/3 + (3^, we have 

\\Vf{Ai,E,,Ni)^VfiA2,E2,N2)\\l 
<(9 + 4/3 + /32) - A2\\l + \\Ei ~ E2\\l + \\Ni - 7V2II 
=L\\iAi,Ei,Ni)-iA2,E2,N2)\\l. 



(19) 



This completes the proof. □ 



